manual annotation
- Oceania > New Zealand (0.04)
- North America > United States > Colorado (0.04)
- Research Report (1.00)
- Workflow (0.67)
- Media > Film (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Communications > Networks (1.00)
Are generative AI text annotations systematically biased?
Stolwijk, Sjoerd B., Boukes, Mark, Trilling, Damian
This paper investigates bias in GLLM annotations by conceptually replicating the manual annotations of Boukes (2024), using various GLLMs (Llama3.1:8b, Llama3.3:70b, GPT4o, Qwen2.5:72b) in combination with five different prompts for five concepts (political content, interactivity, rationality, incivility, and ideology). We find that GLLMs perform adequately in terms of F1 scores, but differ from manual annotations in terms of prevalence, yield substantively different downstream results, and display systematic bias in that they overlap more with each other than with manual annotations. Differences in F1 scores fail to account for the degree of bias.
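The contrast the abstract draws between F1 and prevalence can be illustrated with a minimal sketch. The labels below are made up for illustration and the function names are hypothetical; this is not the authors' code or data. It only shows that an annotator can reach a respectable F1 against manual labels while reporting a clearly different share of positive cases.

```python
def f1_score(gold, pred):
    """F1 of binary predictions against gold labels (1 = concept present)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def prevalence(labels):
    """Share of items labelled positive."""
    return sum(labels) / len(labels)


# Illustrative labels only (assumption): the model over-labels two items.
manual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

print(f"F1: {f1_score(manual, model):.2f}")             # F1: 0.80
print(f"manual prevalence: {prevalence(manual):.2f}")   # 0.40
print(f"model prevalence: {prevalence(model):.2f}")     # 0.60
```

In this toy case the F1 of 0.80 looks adequate, yet the model reports the concept in 60% of items where the manual coding finds 40%, which is the kind of gap that can change downstream estimates.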
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- Asia > Middle East > Jordan (0.04)
Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case
Kembu, Vignesh Kumar, Morandini, Pierandrea, Ranzini, Marta Bianca Maria, Nocera, Antonino
Large Language Models (LLMs) have become a key topic in AI and NLP, transforming sectors like healthcare, finance, education, and marketing by improving customer service, automating tasks, providing insights, improving diagnostics, and personalizing learning experiences. Information extraction from clinical records is a crucial task in digital healthcare. Although traditional NLP techniques have been used for this in the past, they often fall short due to the complexity and variability of clinical language and the dense semantics of free clinical text. Recently, LLMs have become a powerful tool for better understanding and generating human-like text, making them highly effective in this area. In this paper, we explore the ability of open-source multilingual LLMs to understand EHRs (Electronic Health Records) in Italian and help extract information from them in real-time. Our detailed experimental campaign on comorbidity extraction from EHRs reveals that some LLMs struggle in zero-shot, on-premises settings, and others show significant variation in performance, struggling to generalize across various diseases when compared to native pattern matching and manual annotations.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Lombardy > Milan (0.04)
- Health & Medicine > Health Care Technology > Medical Record (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.31)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Med-CRAFT: Automated Construction of Interpretable and Multi-Hop Video Workloads via Knowledge Graph Traversal
Liu, Shenxi, Li, Kan, Zhao, Mingyang, Tian, Yuhang, Zhou, Shoujun, Li, Bin
The scarcity of high-quality, logically annotated video datasets remains a primary bottleneck in advancing Multi-Modal Large Language Models (MLLMs) for the medical domain. Traditional manual annotation is prohibitively expensive and non-scalable, while existing synthetic methods often suffer from stochastic hallucinations and a lack of logical interpretability. To address these challenges, we introduce Med-CRAFT, a novel neuro-symbolic data engineering framework that formalizes benchmark synthesis as a deterministic graph traversal process. Unlike black-box generative approaches, Med-CRAFT extracts structured visual primitives (e.g., surgical instruments, anatomical boundaries) from raw video streams and instantiates them into a dynamic Spatiotemporal Knowledge Graph. By anchoring query generation to valid paths within this graph, we enforce a rigorous Chain-of-Thought (CoT) provenance for every synthesized benchmark item. We instantiate this pipeline to produce M3-Med-Auto, a large-scale medical video reasoning benchmark exhibiting fine-grained temporal selectivity and multi-hop logical complexity. Comprehensive evaluations demonstrate that our automated pipeline generates query workloads with complexity comparable to expert-curated datasets. Furthermore, a logic alignment analysis reveals a high correlation between the prescribed graph topology and the reasoning steps of state-of-the-art MLLMs, validating the system's capability to encode verifiable logic into visual-linguistic benchmarks. This work paves the way for scalable, low-cost construction of robust evaluation protocols in critical domains.
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Health & Medicine > Health Care Technology (0.48)
- Health & Medicine > Surgery (0.34)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.63)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Computational frame analysis revisited: On LLMs for studying news coverage
Kunjar, Sharaj, Smith, Alyssa Hasegawa, Mckenzie, Tyler R, Mohbe, Rushali, Scarpino, Samuel V, Welles, Brooke Foucault
Computational approaches have previously shown various promises and pitfalls when it comes to the reliable identification of media frames. Generative LLMs like GPT and Claude are increasingly being used as content analytical tools, but how effective are they for frame analysis? We address this question by systematically evaluating them against their computational predecessors: bag-of-words models and encoder-only transformers; and traditional manual coding procedures. Our analysis rests on a novel gold standard dataset that we inductively and iteratively developed through the study, investigating six months of news coverage of the US Mpox epidemic of 2022. While we discover some potential applications for generative LLMs, we demonstrate that they were consistently outperformed by manual coders, and in some instances, by smaller language models. Some form of human validation was always necessary to determine appropriate model choice. Additionally, by examining how the suitability of various approaches depended on the nature of different tasks that were part of our frame analytical workflow, we provide insights as to how researchers may leverage the complementarity of these approaches to use them in tandem. We conclude by endorsing a methodologically pluralistic approach and put forth a roadmap for computational frame analysis for researchers going forward.
- North America > United States > Minnesota (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > Wales > Ceredigion > Aberystwyth (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
Automated Segmentation of Coronal Brain Tissue Slabs for 3D Neuropathology
Ramirez, Jonathan Williams, Zemlyanker, Dina, Deden-Binder, Lucas, Herisse, Rogeny, Pallares, Erendira Garcia, Gopinath, Karthik, Gazula, Harshvardhan, Mount, Christopher, Kozanno, Liana N., Marshall, Michael S., Connors, Theresa R., Frosch, Matthew P., Montine, Mark, Oakley, Derek H., Mac Donald, Christine L., Keene, C. Dirk, Hyman, Bradley T., Iglesias, Juan Eugenio
Advances in image registration and machine learning have recently enabled volumetric analysis of postmortem brain tissue from conventional photographs of coronal slabs, which are routinely collected in brain banks and neuropathology laboratories worldwide. One caveat of this methodology is the requirement of segmentation of the tissue from photographs, which currently requires costly manual intervention. In this article, we present a deep learning model to automate this process. The automatic segmentation tool relies on a U-Net architecture that was trained with a combination of 1,414 manually segmented images of both fixed and fresh tissue, from specimens with varying diagnoses, photographed at two different sites. Automated model predictions on a subset of photographs not seen in training were analyzed to estimate performance compared to manual labels, including both inter- and intra-rater variability. Our model achieved a median Dice score over 0.98, mean surface distance under 0.4 mm, and 95% Hausdorff distance under 1.60 mm, which approaches inter-/intra-rater levels. Our tool is publicly available at surfer.nmr.mgh.harvard.edu/fswiki/PhotoTools.
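For readers unfamiliar with the Dice score reported above, a minimal sketch follows. It assumes binary masks stored as nested lists; the function name and the toy masks are hypothetical and are not taken from the authors' tool. The score is twice the overlap between two masks divided by the sum of their sizes, so 1.0 means identical segmentations.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Convention (assumption): two empty masks count as perfect agreement.
        return 1.0
    return 2 * intersection / total


# Toy 3x3 masks (illustrative only): the prediction misses one tissue pixel.
manual_mask = [[0, 1, 1],
               [0, 1, 1],
               [0, 0, 0]]
predicted_mask = [[0, 1, 1],
                  [0, 1, 0],
                  [0, 0, 0]]

print(f"Dice: {dice_score(manual_mask, predicted_mask):.3f}")  # Dice: 0.857
```

Here the masks share 3 pixels and contain 4 and 3 pixels respectively, giving 2 * 3 / 7. The surface and Hausdorff distances in the abstract measure boundary error instead of overlap and are not covered by this sketch.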
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)